1 Adaptive cache coherency for detecting migratory shared data
Alan L. Cox, Robert J. Fowler 

May 1993 ACM SIGARCH Computer Architecture News, Proceedings of the 20th
annual international symposium on Computer architecture, Volume 21 issue 2

Additional Information: full citation, abstract, references, citings, index terms

Parallel programs exhibit a small number of distinct data-sharing patterns. A common data- 
sharing pattern, migratory access, is characterized by exclusive read and write access by 
one processor at a time to a shared datum. We describe a family of adaptive cache 
coherency protocols that dynamically identify migratory shared data in order to reduce the 
cost of moving them. The protocols use a standard memory model and processor-cache 
interface. They do not require any compile-time or run-time ... 
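To make "dynamically identify migratory shared data" concrete, here is a minimal Python sketch under stated assumptions. It uses one simplified detection rule: the directory marks a block as migratory when a write arrives while exactly two copies exist, the writer holds one of them, and a different processor wrote the block last (a read-then-write hand-off). After that, a read miss is answered with an exclusive copy, so the reader's expected write needs no separate invalidation. The class and method names are hypothetical, and the rule illustrates the idea in the abstract rather than reproducing the paper's exact protocol. It also omits reverting a block to ordinary sharing.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class DirEntry:
    """Hypothetical directory state for one memory block."""
    sharers: Set[int] = field(default_factory=set)  # processors holding a copy
    last_writer: Optional[int] = None               # processor that last wrote the block
    migratory: bool = False                         # current classification


class MigratoryDirectory:
    """Hypothetical adaptive directory; the detection rule is an assumption."""

    def __init__(self) -> None:
        self.entries: dict = {}

    def _entry(self, block) -> DirEntry:
        return self.entries.setdefault(block, DirEntry())

    def read_miss(self, proc: int, block) -> str:
        e = self._entry(block)
        if e.migratory and e.sharers:
            # Migratory block: move an exclusive copy to the reader and drop the
            # old holder, so the reader's expected write needs no invalidation.
            e.sharers = {proc}
            return "exclusive"
        e.sharers.add(proc)
        return "shared"

    def write(self, proc: int, block) -> int:
        """Perform a write and return the number of invalidations sent."""
        e = self._entry(block)
        # Assumed detection rule: two copies, the writer holds one of them,
        # and another processor wrote the block last.
        if (len(e.sharers) == 2 and proc in e.sharers
                and e.last_writer is not None and e.last_writer != proc):
            e.migratory = True
        invalidations = len(e.sharers - {proc})
        e.sharers = {proc}
        e.last_writer = proc
        return invalidations
```

Under this rule the first hand-off costs one invalidation. Once the block is classified as migratory, each later read-then-write by a new processor costs none.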

2 An evaluation of directory schemes for cache coherence
A. Agarwal, R. Simoni, J. Hennessy, M. Horowitz 

May 1988 ACM SIGARCH Computer Architecture News, Proceedings of the 15th
Annual International Symposium on Computer architecture, volume 16 issue 2

Full text available: pdf(1.35 MB) Additional Information: full citation, abstract, references, citings, index terms

The problem of cache coherence in shared-memory multiprocessors has been addressed 
using two basic approaches: directory schemes and snoopy cache schemes. Directory 
schemes have been given less attention in the past several years, while snoopy cache 
methods have become extremely popular. Directory schemes for cache coherence are 
potentially attractive in large multiprocessor systems that are beyond the scaling limits of 
the snoopy cache schemes. Slight modifications to directory schemes can ... 
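To show how a directory avoids the bus broadcast that snoopy schemes depend on, here is a minimal Python model of the directory entry for a single block. It assumes a limited-pointer organization that stores at most `max_pointers` sharer IDs. When all pointers are in use, it invalidates one existing copy instead of broadcasting. With `max_pointers=None` it behaves like a full-map directory. The class name and the choice to evict the oldest sharer are assumptions made for illustration. Invalidations go point-to-point, and only to the recorded sharers.

```python
from typing import List, Optional


class DirectoryEntry:
    """Hypothetical directory state for one block; not a specific machine's design."""

    def __init__(self, max_pointers: Optional[int] = None) -> None:
        self.max_pointers = max_pointers  # None = full map (one bit per processor)
        self.sharers: List[int] = []      # processor IDs holding a copy, oldest first
        self.invalidations = 0            # point-to-point invalidation messages sent

    def read(self, proc: int) -> None:
        if proc in self.sharers:
            return
        if self.max_pointers is not None and len(self.sharers) == self.max_pointers:
            # Pointer overflow without broadcast: invalidate one copy (oldest, by assumption).
            self.sharers.pop(0)
            self.invalidations += 1
        self.sharers.append(proc)

    def write(self, proc: int) -> None:
        # Only the recorded sharers receive invalidations; no bus broadcast is needed.
        self.invalidations += sum(1 for p in self.sharers if p != proc)
        self.sharers = [proc]
```

Suppose four processors read the block and processor 0 then writes it. A full map sends three invalidations. A two-pointer entry sends four: two on pointer overflow and two on the write.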

3 An evaluation of directory schemes for cache coherence

Anant Agarwal, Richard Simoni, John Hennessy, Mark Horowitz
August 1998 25 years of the international symposia on Computer architecture 
(selected papers) 

Full text available: pdf(1.31 MB) Additional Information: full citation, references, index terms



4 A performance evaluation of optimal hybrid cache coherency protocols
Jack E. Veenstra, Robert J. Fowler

September 1992 ACM SIGPLAN Notices, Proceedings of the fifth international
conference on Architectural support for programming languages and
operating systems, volume 27 issue 9

Full text available: pdf(1.28 MB) Additional Information: full citation, references, citings, index terms



5 Boosting the performance of hybrid snooping cache protocols
Fredrik Dahlgren 

May 1995 ACM SIGARCH Computer Architecture News, Proceedings of the 22nd
annual international symposium on Computer architecture, volume 23 issue 2
Full text available: pdf(1.23 MB) Additional Information: full citation, abstract, references, citings, index terms

Previous studies of bus-based shared-memory multiprocessors have shown hybrid write- 
invalidate/write-update snooping protocols to be incapable of providing consistent 
performance improvements over write-invalidate protocols. In this paper, we analyze the 
deficiencies of hybrid snooping protocols under release consistency, and show how these 
deficiencies can be dramatically reduced by using write caches and read snarfing. Our
performance evaluation is based on program-driven simulation and a set o ... 
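Hybrid write-invalidate/write-update protocols are often built as competitive-update schemes. The abstract does not say which variant it studies, so the following Python model is an assumed, simplified one. Each cached copy keeps a counter that is reset to `threshold` on a local access and decremented by every remote update. When the counter reaches zero, the copy invalidates itself, so data written by only one processor stops generating bus updates. The class name is hypothetical. Write caches and read snarfing, the two remedies the paper proposes, are not modeled.

```python
from typing import List, Optional


class CompetitiveUpdateBlock:
    """Hypothetical model: one block shared by `num_procs` caches under competitive update."""

    def __init__(self, num_procs: int, threshold: int = 4) -> None:
        self.threshold = threshold
        # Per-cache counter; None means that cache holds no valid copy.
        self.counters: List[Optional[int]] = [None] * num_procs
        self.bus_updates = 0

    def read(self, proc: int) -> None:
        # A miss loads the block and a hit is a local access; both reset the counter.
        self.counters[proc] = self.threshold

    def write(self, proc: int) -> None:
        self.counters[proc] = self.threshold
        others = [p for p, c in enumerate(self.counters) if c is not None and p != proc]
        if others:
            self.bus_updates += 1  # one update transaction reaches all other copies
        for p in others:
            self.counters[p] -= 1
            if self.counters[p] == 0:
                self.counters[p] = None  # self-invalidate: now acts like write-invalidate

    def valid_copies(self) -> List[int]:
        return [p for p, c in enumerate(self.counters) if c is not None]
```

With `threshold=2`, two consecutive remote writes remove a copy that its owner did not touch in between. After that, further writes from the same processor use no bus updates.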

6 Synchronization with multiprocessor caches

Joonwon Lee, Umakishore Ramachandran 

May 1990 ACM SIGARCH Computer Architecture News, Proceedings of the 17th
annual international symposium on Computer Architecture, volume 18 issue 3
Full text available: pdf(1.18 MB) Additional Information: full citation, abstract, references, citings, index terms

Introducing private caches in bus-based shared memory multiprocessors leads to the cache 
consistency problem since there may be multiple copies of shared data. However, the ability 
to snoop on the bus coupled with the fast broadcast capability allows the design of special 
hardware support for synchronization. We present a new lock-based cache scheme which 
incorporates synchronization into the cache coherency mechanism. With this scheme high- 
level synchronization primitives as well as low-le ... 
7 Efficient synchronization primitives for large-scale cache-coherent multiprocessors
James R. Goodman, Mary K. Vernon, Philip J. Woest 

April 1989 ACM SIGARCH Computer Architecture News, Proceedings of the third
international conference on Architectural support for programming 
languages and operating systems, volume 17 issue 2 

Full text available: pdf(1.48 MB) Additional Information: full citation, abstract, references, citings, index terms

This paper proposes a set of efficient primitives for process synchronization in 
multiprocessors. The only assumptions made in developing the set of primitives are that 
hardware combining is not implemented in the interconnect, and (in one case) that the
interconnect supports broadcast. The primitives make use of synchronization bits (syncbits) 
to provide a simple mechanism for mutual exclusion. The proposed implementation of the 
primitives includes efficient (i.e. ...
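The abstract says syncbits provide a simple mechanism for mutual exclusion. One way a lock bit can avoid spin traffic on the interconnect is to record waiting processors in a queue and hand the lock directly to the next one on release. The Python model below sketches that idea for a single lock bit. The class, its FIFO hand-off, and the error raised when a non-holder releases are assumptions made for illustration. They are not the paper's specified primitives.

```python
from collections import deque
from typing import Deque, Optional


class QueuedSyncBit:
    """Hypothetical lock bit with a FIFO queue of waiting processors."""

    def __init__(self) -> None:
        self.holder: Optional[int] = None
        self.waiters: Deque[int] = deque()

    def acquire(self, proc: int) -> bool:
        """Return True if `proc` holds the lock; otherwise enqueue it (once)."""
        if self.holder is None:
            self.holder = proc
            return True
        if self.holder == proc:
            return True  # already held by this processor
        if proc not in self.waiters:
            # The waiter is recorded instead of retrying over the interconnect.
            self.waiters.append(proc)
        return False

    def release(self, proc: int) -> Optional[int]:
        """Pass the lock to the oldest waiter and return the new holder (or None)."""
        if proc != self.holder:
            raise ValueError("only the current holder may release the lock")
        self.holder = self.waiters.popleft() if self.waiters else None
        return self.holder
```

The design choice being illustrated is the direct hand-off. A release wakes exactly one waiter instead of letting every spinning processor race for the lock.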

8 The Sun Fireplane system interconnect
Alan Charlesworth 

November 2001 Proceedings of the 2001 ACM/IEEE conference on Supercomputing 
(CDROM) 

Full text available: pdf(224.87 KB) Additional Information: full citation, abstract, references, citings, index terms

System interconnect is a key determiner of the cost, performance, and reliability of large 
cache-coherent, shared-memory multiprocessors. Interconnect implementations have to
accommodate ever greater numbers of ever faster processors. This paper describes the 
Sun™ Fireplane two-level cache-coherency protocol, and its use in the medium and large- 
sized UltraSPARC-III-based Sun Fire™ servers. 

9 C2MP: a cache-coherent ...

D. E. Marquardt, H. S. Alkhatib 

August 1989 Proceedings of the 1989 ACM/IEEE conference on Supercomputing 

Full text available: pdf Additional Information: full citation, abstract, references, citings, index terms

Current research into the problems of cache coherency in multiprocessor (MP) systems has
primarily focused on bus based memory interconnection networks (M-ICN) and the use of 
various types of "snooping" cache coherency protocols. Bus bandwidth limitations can be 
alleviated through the use of wider bandwidth general interconnection structures, such as a 
crossbar switch. However, if private caches are used, the cache coherency problem 
becomes mul ... 

10 Cache coherence in large-scale shared multiprocessors: issues and comparisons

David J. Lilja 

September 1993 ACM Computing Surveys (CSUR), volume 25 issue 3 

Full text available: pdf(3.12 MB) Additional Information: full citation, references, citings, index terms



11 Measuring memory hierarchy performance of cache-coherent multiprocessors using micro benchmarks

Cristina Hristea, Daniel Lenoski, John Keen 

November 1997 Proceedings of the 1997 ACM/IEEE conference on Supercomputing 
(CDROM) 

Full text available: pdf(97.47 KB) Additional Information: full citation, abstract, references, citings

Even with today's large caches, the increasing performance gap between processors and 
memory systems imposes a memory bottleneck for many important scientific and 
commercial applications. This bottleneck is intensified in shared-memory multiprocessors by 
contention and the effects of cache coherency. Under heavy memory contention, the 
memory latency may increase 2 or 3 times. Nonetheless, as more sophisticated techniques
are used to hide latency and increase bandwidth, measuring memory performanc ... 
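Memory-latency micro benchmarks commonly time a dependent pointer chase through a randomly permuted array. Each load has to wait for the previous one, and hardware prefetchers cannot predict the next address. The abstract does not describe the paper's own benchmarks, so this Python sketch only illustrates that common technique. It builds the chain with Sattolo's algorithm, which yields a single cycle covering every slot. Python is far too slow to time cache latency. A real benchmark would run the `p = next[p]` loop in C or assembly over an array sized to the cache level being measured. The function names are hypothetical.

```python
import random
from typing import List


def make_chase_chain(n: int, seed: int = 0) -> List[int]:
    """Hypothetical helper: next_idx forms one cycle, so a lap visits all n slots once."""
    rng = random.Random(seed)
    next_idx = list(range(n))
    # Sattolo's algorithm: choosing j strictly below i produces a single n-cycle.
    for i in range(n - 1, 0, -1):
        j = rng.randrange(i)
        next_idx[i], next_idx[j] = next_idx[j], next_idx[i]
    return next_idx


def chase(next_idx: List[int], steps: int, start: int = 0) -> int:
    """Follow the chain; each step depends on the result of the previous load."""
    p = start
    for _ in range(steps):
        p = next_idx[p]
    return p
```

Because the chain is a single cycle, chasing it for exactly n steps returns to the starting slot. This makes a convenient sanity check before the timed loop.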

12 A distributed ...
S. Mori, H. Saito, M. Goshima, S. Tomita, M. Yanagihara, T. Tanaka, D. Fraser, K. Joe, H. Nitta
December 1993 Proceedings of the 1993 ACM/IEEE conference on Supercomputing 

Full text available: pdf Additional Information: full citation, references, citings, index terms



13 Evaluation of the lock mechanism in a snooping cache

Toshiaki Tarui, Takayuki Nakagawa, Noriyasu Ido, Machiko Asaie, Mamoru Sugie 
August 1992 Proceedings of the 6th international conference on Supercomputing 

Full text available: pdf(1.11 MB) Additional Information: full citation, abstract, references, index terms

This paper discusses the design concepts of a lock mechanism for a Parallel Inference 
Machine (the PIM/c prototype) and investigates the performance of the mechanism in 
detail. Lock operations are extremely frequent on the PIM; however, lock contention rarely 
occurs during normal memory usage. For this reason, the lock mechanism is designed so as 
to minimize the lock overhead time in the case of no contention. This is done by using an
invalidation lock mechanism, which utilizes t ... 

14 Verification of the Futurebus+ cache coherence protocol: a case study in model checking

Kylie Williams, Robert Esser 

January 2004 Proceedings of the 27th conference on Australasian computer science - 
Volume 26 

Full text available: pdf(175.99 KB) Additional Information: full citation, abstract, references

This paper presents a case study for automatic verification using the Communicating 
Sequential Processes formalism. The case study concerns the Futurebus+ cache coherency 
standard; we develop a formal model of the protocol and perform some verification tasks 
upon it. In the process of doing so, we extend the previous solution by developing a formal 
specification of cache coherence that is suitable for the verification of both directory and 
snooping based cache coherence protocols. 

15 Symmetric Multiprocessing on Programmable Chips Made Easy 
Austin Hung, William Bishop, Andrew Kennings 

March 2005 Proceedings of the conference on Design, Automation and Test in Europe - 
Volume 1 

Full text available: pdf(181.56 KB) Additional Information: full citation, abstract

Vendor-provided softcore processors often support advanced features such as caching that 
work well in uniprocessor or uncoupled multiprocessor architectures. However, it is 
a challenge to implement Symmetric Multiprocessor on a Programmable Chip (SMPoPC)
systems using such processors. This paper presents an implementation of a tightly-coupled, 
cache-coherent symmetric multiprocessing architecture using a vendor-provided softcore 
processor. Experimental results show that this implementation can be ... 

16 Comparison of hardware and software cache coherence schemes

Sarita V. Adve, Vikram S. Adve, Mark D. Hill, Mary K. Vernon 

April 1991 ACM SIGARCH Computer Architecture News, Proceedings of the 18th
annual international symposium on Computer architecture, Volume 19 issue 3
Full text available: pdf(1.22 MB) Additional Information: full citation, references, citings, index terms



17 Design and performance of a coherent cache for parallel logic programming architectures

A. Goto, A. Matsumoto, E. Tick 

April 1989 ACM SIGARCH Computer Architecture News, Proceedings of the 16th
annual international symposium on Computer architecture, volume 17 issue 3
Full text available: pdf Additional Information: full citation, abstract, references, citings, index terms

This paper describes the design and performance of a tightly-coupled shared-memory 
coherent cache optimized for the execution of parallel logic programming architectures. The 
cache utilizes a copy-back write-allocation protocol having five states and a hardware lock 
mechanism. Optimizations for logic programming are introduced in four software-controlled 
memory access commands: direct-write, exclusive-read, read-purge, and read-invalidate. 
In this paper we describe these operations and pres ... 

18 Verification techniques for cache coherence protocols

Fong Pong, Michel Dubois 

March 1997 ACM Computing Surveys (CSUR), volume 29 issue 1 
Full text available: pdf(1.25 MB) Additional Information: full citation, abstract, references, citings, index terms

In this article we present a comprehensive survey of various approaches for the verification 
of cache coherence protocols based on state enumeration, symbolic model checking, and
symbolic state models. Since these techniques search the state space of the protocol 
exhaustively, the amount of memory required to manipulate that state information and the 
verification time grow very fast with the number of processors and the complexity of the 
protocol mechanism ... 

Keywords: cache coherence, finite state machine, protocol verification, shared-memory 
multiprocessors, state representation and expansion 
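The survey's point about state-space growth is easy to see in a toy example. The Python sketch below performs explicit state enumeration. It runs a breadth-first search over the global states of n caches running a textbook MSI invalidation protocol and checks a single-writer invariant in every reachable state. The protocol model is an assumption chosen for brevity: atomic transactions, a single block, and no transient states. Real verifiers must also handle transient states and message races, which is where the blow-up becomes severe. The function names are hypothetical.

```python
from collections import deque
from typing import Iterator, Optional, Set, Tuple

M, S, I = "M", "S", "I"  # per-cache states of a textbook MSI protocol (assumed model)
State = Tuple[str, ...]


def successors(state: State) -> Iterator[State]:
    """Yield global states reachable by one atomic read, write, or eviction."""
    n = len(state)
    for p in range(n):
        if state[p] == I:
            # Read miss: the reader gets S; a modified copy elsewhere is downgraded to S.
            yield tuple(S if q == p or s == M else s for q, s in enumerate(state))
        if state[p] != M:
            # Write (miss or upgrade): the writer gets M; all other copies are invalidated.
            yield tuple(M if q == p else I for q in range(n))
        if state[p] != I:
            # Eviction, with write-back if the copy was modified.
            yield tuple(I if q == p else s for q, s in enumerate(state))


def coherent(state: State) -> bool:
    """Single-writer invariant: an M copy may not coexist with any other valid copy."""
    valid = [s for s in state if s != I]
    return M not in valid or len(valid) == 1


def explore(n: int) -> Tuple[Set[State], Optional[State]]:
    """Breadth-first enumeration; returns reachable states and a violating state, if any."""
    start: State = (I,) * n
    seen, frontier = {start}, deque([start])
    while frontier:
        state = frontier.popleft()
        if not coherent(state):
            return seen, state
        for nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return seen, None
```

For this model the reachable set has 2^n + n states: the all-invalid state, n single-owner states, and 2^n - 1 nonempty sets of shared copies. The count therefore roughly doubles with each added processor, even before any transient state is modeled.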



19 STiNG: a CC-NUMA computer system for the commercial marketplace
Tom Lovett, Russell Clapp 

May 1996 ACM SIGARCH Computer Architecture News, Proceedings of the 23rd
annual international symposium on Computer architecture, volume 24 issue 2
Full text available: pdf(1.30 MB) Additional Information: full citation, abstract, references, citings, index terms

"STiNG" is a Cache Coherent Non-Uniform Memory Access (CC-NUMA) Multiprocessor 
designed and built by Sequent Computer Systems, Inc. It combines four-processor
Symmetric Multi-processor (SMP) nodes (called Quads), using a Scalable Coherent Interface 
(SCI) based coherent interconnect. The Quads are based on the Intel P6 processor and the 
external bus it defines. In addition to 4 P6 processors, each Quad may contain up to 4 
GBytes of system memory, 2 Peripheral Component Interconnect (PCI) busses for ...



20 A memory management unit and cache controller for the MARS system
Feipei Lai, Chyuan-Yow Wu, Tai-Ming Parng 

November 1990 Proceedings of the 23rd annual workshop and symposium on 
Microprogramming and microarchitecture 

Full text available: pdf(1.07 MB) Additional Information: full citation, abstract, references

For large caches, the interaction between cache access and address translation affects the 
machine cycle time and the access time to memory. The physically addressed caches slow 
down the cache access due to the virtual address translation. The virtually addressed 
caches are faster, but the synonym problem is difficult to handle. With some software
constraints and hardware support, our virtually addressed, physically tagged caches can
achieve the same speed as traditional virtually addressed cac ... 
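The abstract does not spell out the MARS design. A common way to get virtual-index speed with physical tags is to keep the set-index bits inside the page offset. The index is then identical in the virtual and physical address, so synonyms always map to the same set. The Python sketch below models that general technique. The 4 KB page size, the LRU replacement, and the `translate` callback standing in for the TLB are assumptions, and the sketch should not be read as the MARS cache controller. The names are hypothetical.

```python
from typing import Callable, List

PAGE_BYTES = 4096  # assumed page size


def index_within_page_offset(cache_bytes: int, ways: int, line_bytes: int,
                             page_bytes: int = PAGE_BYTES) -> bool:
    """True if set index plus line offset fit inside the page offset (no synonym aliasing)."""
    num_sets = cache_bytes // (ways * line_bytes)
    return num_sets * line_bytes <= page_bytes


class VIPTCache:
    """Hypothetical virtually indexed, physically tagged set-associative cache with LRU."""

    def __init__(self, cache_bytes: int, ways: int, line_bytes: int,
                 translate: Callable[[int], int]) -> None:
        self.line_bytes = line_bytes
        self.ways = ways
        self.num_sets = cache_bytes // (ways * line_bytes)
        self.sets: List[List[int]] = [[] for _ in range(self.num_sets)]
        self.translate = translate  # stands in for the TLB (virtual -> physical)

    def access(self, vaddr: int) -> bool:
        """Return True on a hit; the index comes from vaddr, the tag from the physical address."""
        index = (vaddr // self.line_bytes) % self.num_sets  # available before translation
        paddr = self.translate(vaddr)                       # overlapped with indexing in hardware
        tag = paddr // (self.line_bytes * self.num_sets)
        lines = self.sets[index]
        hit = tag in lines
        if hit:
            lines.remove(tag)  # move to the most-recently-used position
        elif len(lines) == self.ways:
            lines.pop(0)       # evict the least recently used tag
        lines.append(tag)
        return hit
```

Consider a 32 KB, 8-way cache with 64-byte lines: 64 sets of 64 bytes span exactly 4 KB. The index fits in the page offset, so two virtual aliases of one physical page select the same set and compare equal tags. A 64 KB, 4-way cache with the same line size needs 16 KB of index-plus-offset bits and could place aliases in different sets.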